1 Research track papers: A probabilistic framework for semi-supervised clustering
Sugato Basu, Mikhail Bilenko, Raymond J. Mooney 

August 2004 Proceedings of the 2004 ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(187.51 KB) Additional Information: full citation, abstract, references, index terms

Unsupervised clustering can be significantly improved using supervision in the form of 
pairwise constraints, i.e., pairs of instances labeled as belonging to same or different 
clusters. In recent years, a number of algorithms have been proposed for enhancing 
clustering quality by employing such supervision. Such methods use the constraints to either 
modify the objective function, or to learn the distance measure. We propose a probabilistic 
model for semi-supervised clustering based on Hidden Mar ... 

Keywords: distance metric learning, hidden Markov random fields, semi-supervised 
clustering 
Martin Franz, Todd Ward, J. Scott McCarley, Wei-Jing Zhu

September 2001 Proceedings of the 24th annual international ACM SIGIR conference on 
Research and development in information retrieval 

Full text available: pdf(302.02 KB) Additional Information: full citation, abstract, references, index terms

We investigate important differences between two styles of document clustering in the 
context of Topic Detection and Tracking. Converting a Topic Detection system into a Topic 
Tracking system exposes fundamental differences between these two tasks that are 
important to consider in both the design and the evaluation of TDT systems. We also identify 
features that can be used in systems for both tasks. 

3 Technical session 15: WWW image retrieval: Hierarchical clustering of WWW image
search results using visual, textual and link information

Deng Cai, Xiaofei He, Zhiwei Li, Wei-Ying Ma, Ji-Rong Wen 

October 2004 Proceedings of the 12th annual ACM international conference on 
Multimedia 

Full text available: pdf(1.15 MB) Additional Information: full citation, abstract, references, index terms

We consider the problem of clustering Web image search results. Generally, the image 
search results returned by an image search engine contain multiple topics. Organizing the 
results into different semantic clusters facilitates users' browsing. In this paper, we propose 
a hierarchical clustering method using visual, textual and link analysis. By using a vision- 
based page segmentation algorithm, a web page is partitioned into blocks, and the textual 
and link information of an image can be accu ... 
Xin Zheng, Deng Cai, Xiaofei He, Wei-Ying Ma, Xueyin Lin

October 2004 Proceedings of the 12th annual ACM international conference on 
Multimedia 

Full text available: pdf(1.59 MB) Additional Information: full citation, abstract, references, index terms

It is important and challenging to make the growing image repositories easy to search and 
browse. Image clustering is a technique that helps in several ways, including image data 
preprocessing, user interface designing, and search result representation. Spectral 
clustering method has been one of the most promising clustering methods in the last few 
years, because it can cluster data with complex structure, and the (near) global optimum is 
guaranteed. However, existing spectral clustering algo ... 

Keywords: image clustering, locality preserving clustering, locality preserving projections, 
spectral clustering 



5 Meeting experience: Experiential meeting system
Ramesh Jain, Pilho Kim, Zhao Li 

November 2003 Proceedings of the 2003 ACM SIGMM workshop on Experiential 
telepresence 

Full text available: pdf(388.84 KB) Additional Information: full citation, abstract, references, index terms

We are developing experiential meeting systems to allow people to be tele-present in a 
remote meeting and to be able to review proceedings of a meeting or of several meetings 
using all the data recorded in a meeting. We consider this as a problem in management and 
experiential access to all multimedia data acquired in a meeting. The data includes video, 
audio, presentations, text material, databases and websites related to people and the 
discussions in the meeting, and any other data or informat ... 

Keywords: data event, elemental event and domain event, event, event based data 
processing, experiential systems, meeting 



6 Image annotation and video summarization: Video summarization based on user log
enhanced link analysis

Bin Yu, Wei-Ying Ma, Klara Nahrstedt, Hong-Jiang Zhang 

November 2003 Proceedings of the eleventh ACM international conference on 
Multimedia 

Full text available: pdf(771.50 KB) Additional Information: full citation, abstract, references, index terms

Efficient video data management calls for intelligent video summarization tools that 
automatically generate concise video summaries for fast skimming and browsing. Traditional 
video summarization techniques are based on low-level feature analysis, which generally 
fails to capture the semantics of video content. Our vision is that users unintentionally 
embed their understanding of the video content in their interaction with computers. This 
valuable knowledge, which is difficult for computers to I ... 

Keywords: link analysis, log mining, skimming, user behavior, video content analysis, 
video summarization 
July 2004 Proceedings of the 27th annual international conference on Research and
development in information retrieval 

Full text available: pdf(210.38 KB) Additional Information: full citation, abstract, references, index terms

Organizing Web search results into clusters facilitates users' quick browsing through search 
results. Traditional clustering techniques are inadequate since they don't generate clusters 
with highly readable names. In this paper, we reformalize the clustering problem as a salient 
phrase ranking problem. Given a query and the ranked list of documents (typically a list of 
titles and snippets) returned by a certain Web search engine, our method first extracts and 
ranks salient phrases as candidate c ... 

Keywords: document clustering, regression analysis, search result organization 



8 QueryTracker: An Agent for Tracking Persistent Information Needs
Gabriel Somlo, Adele E. Howe 

July 2004 Proceedings of the Third International Joint Conference on Autonomous 
Agents and Multiagent Systems - Volume 1 

Full text available: pdf(383.94 KB) Additional Information: full citation, abstract

Most people have long term information interests. Current Web search engines satisfy 
immediate information needs. Specific sites support tracking of long term interests. We 
present an agent that satisfies a gap in these services. QueryTracker implements a search 
engine interface with state. A user's query and a learned alternative is automatically
submitted daily to a search engine. A profile of the user's interest is constructed based on
user relevance feedback. Daily search results are dissemi ... 

9 Brave new topics - session 3: the effect of benchmarking on advances in semantic
video: On the detection of semantic concepts at TRECVID

Milind R. Naphade, John R. Smith 

October 2004 Proceedings of the 12th annual ACM international conference on 
Multimedia 

Full text available: pdf(287.92 KB) Additional Information: full citation, abstract, references, index terms

Semantic multimedia management is necessary for the effective and widespread utilization 
of multimedia repositories and realizing the potential that lies untapped in the rich 
multimodal information content. This challenge has driven researchers to devise new 
algorithms and systems that enable automatic or semi-automatic tagging of large scale 
multimedia content with rich semantics. An emerging research area is the detection of a 
predetermined set of semantic concepts that can act as semantic ... 

Keywords: NIST TRECVID benchmark, average precision, semantic concept detection 



10 Technical poster session 3: multimedia tools, end-systems, and applications: Cortina: a
system for large-scale, content-based web image retrieval
Till Quack, Ullrich Monich, Lars Thiele, B. S. Manjunath 

October 2004 Proceedings of the 12th annual ACM international conference on 
Multimedia 

Full text available: pdf(186.03 KB) Additional Information: full citation, abstract, references, index terms

Recent advances in processing and networking capabilities of computers have led to an 
accumulation of immense amounts of multimedia data such as images. One of the largest 
repositories for such data is the World Wide Web (WWW). We present Cortina, a large-scale 
image retrieval system for the WWW. It handles over 3 million images to date. The system 
retrieves images based on visual features and collateral text. We show that a search process 
which consists of an initial query-by-keyword or quer ... 

Keywords: MPEG-7, WWW, association rules, clustering, large-scale, online, relevance 
feedback, semantics, web image retrieval 




http://portal.acm.org/resultsxfm?CFID=W 2/21/05 



Results (page 1): +"unsupervised clustering" 






11 Cross-lingual C*ST*RD: English access to Hindi information 

Anton Leuski, Chin-Yew Lin, Liang Zhou, Ulrich Germann, Franz Josef Och, Eduard Hovy 
September 2003 ACM Transactions on Asian Language Information Processing (TALIP),
Volume 2 Issue 3

Full text available: pdf(210.61 KB) Additional Information: full citation, abstract, references, index terms

We present C*ST*RD, a cross-language information delivery system that supports cross- 
language information retrieval, information space visualization and navigation, machine 
translation, and text summarization of single documents and clusters of documents. 
C*ST*RD was assembled and trained within 1 month, in the context of DARPA's Surprise 
Language Exercise, that selected as source a heretofore unstudied language, Hindi. Given 
the brief time, we could not create deep Hindi capabilities for all th ... 

Keywords: Cross-language information retrieval, Hindi-to-English machine translation, 
headline generation, information retrieval and information space navigation, single- and 
multi-document text summarization 



12 Scalable feature selection, classification and signature generation for organizing large
text databases into hierarchical topic taxonomies
Soumen Chakrabarti, Byron Dom, Rakesh Agrawal, Prabhakar Raghavan 
August 1998 The VLDB Journal — The International Journal on Very Large Data Bases,
Volume 7 Issue 3

Full text available: pdf(281.37 KB) Additional Information: full citation, abstract, citings, index terms

We explore how to organize large text databases hierarchically by topic to aid better 
searching, browsing and filtering. Many corpora, such as internet directories, digital 
libraries, and patent databases are manually organized into topic hierarchies, also called 
taxonomies. Similar to indices for relational data, taxonomies make search and access more 
efficient. However, the exponential growth in the volume of on-line textual information 
makes it nearly impossible to maintain such taxono ... 



13 Full Technical Papers: Learning implicit user interest hierarchy for context in
personalization
Hyoung R. Kim, Philip K. Chan 

January 2003 Proceedings of the 8th international conference on Intelligent user 
interfaces 

Full text available: pdf(191.53 KB) Additional Information: full citation, abstract, references, index terms

To provide a more robust context for personalization, we desire to extract a continuum of 
general (long-term) to specific (short-term) interests of a user. Our proposed approach is to 
learn a user interest hierarchy (UIH) from a set of web pages visited by a user. We devise a 
divisive hierarchical clustering (DHC) algorithm to group words (topics) into a hierarchy 
where more general interests are represented by a larger set of words. Each web page can 
then be assigned to nodes in the hierarchy f ... 

Keywords: clustering algorithm, concept clustering, user interest hierarchy, user profile 



14 Classification: Categorizing information objects from user access patterns
Mao Chen, Andrea LaPaugh, Jaswinder Pal Singh 

November 2002 Proceedings of the eleventh international conference on Information 
and knowledge management 

Full text available: pdf(321.09 KB) Additional Information: full citation, abstract, references, index terms

Many web sites have dynamic information objects whose topics change over time. 
Classifying these objects automatically and promptly is a challenging and important problem 
for site masters. Traditional content-based and link structure based classification techniques 
have intrinsic limitations for this task. This paper proposes a framework to classify an object 
into an existing category structure by analyzing the users' traversals in the category 
structure. The key idea is to infer a ...

Keywords: category structure, classification, dynamic object, multimedia, prediction, user 
accesses 

15 Theory of keyblock-based image retrieval
April 2002 ACM Transactions on Information Systems (TOIS), volume 20 issue 2 

Full text available: pdf(2.14 MB) Additional Information: full citation, abstract, references, index terms, review

The success of text-based retrieval motivates us to investigate analogous techniques which 
can support the querying and browsing of image data. However, images differ significantly 
from text both syntactically and semantically in their mode of representing and expressing 
information. Thus, the generalization of information retrieval from the text domain to the 
image domain is non-trivial. This paper presents a framework for information retrieval in the 
image domain which supports content-based q ... 

Keywords: clustering, codebook, content-based image retrieval, keyblock 

16 Machine learning in automated text categorization
Fabrizio Sebastiani 

March 2002 ACM Computing Surveys (CSUR), volume 34 issue 1 

Full text available: pdf(524.41 KB) Additional Information: full citation, abstract, references, citings, index terms

The automated categorization (or classification) of texts into predefined categories has 
witnessed a booming interest in the last 10 years, due to the increased availability of 
documents in digital form and the ensuing need to organize them. In the research 
community the dominant approach to this problem is based on machine learning techniques: 
a general inductive process automatically builds a classifier by learning, from a set of 
preclassified documents, the characteristics of the categories. ... 

Keywords: Machine learning, text categorization, text classification 



17 Segmentation-based modeling for advanced targeted marketing

C. Apte, E. Bibelnieks, R. Natarajan, E. Pednault, F. Tipu, D. Campbell, B. Nelson 
August 2001 Proceedings of the seventh ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(477.55 KB) Additional Information: full citation, abstract, references, citings, index terms

Fingerhut Business Intelligence (BI) has a long and successful history of building statistical 
models to predict consumer behavior. The models constructed are typically segmentation- 
based models in which the target audience is split into subpopulations (i.e., customer 
segments) and individually tailored statistical models are then developed for each segment. 
Such models are commonly employed in the direct-mail industry; however, segmentation is 
often performed on an ad-hoc basis without directly ... 

Keywords: Segmentation-based models, decision trees, feature selection, linear regression, 
logistic regression, targeted marketing 



18 Efficient clustering of high-dimensional data sets with application to reference matching
Andrew McCallum, Kamal Nigam, Lyle H. Ungar 

August 2000 Proceedings of the sixth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(273.86 KB) Additional Information: full citation, references, citings, index terms

19 An interactive comic book presentation for exploring video

John Boreczky, Andreas Girgensohn, Gene Golovchinsky, Shingo Uchihashi 
April 2000 Proceedings of the SIGCHI conference on Human factors in computing 
systems 

Full text available: pdf(1.62 MB) Additional Information: full citation, abstract, references, citings, index terms

This paper presents a method for generating compact pictorial summarizations of video. We 
developed a novel approach for selecting still images from a video suitable for summarizing 
the video and for providing entry points into it. Images are laid out in a compact, visually 
pleasing display reminiscent of a comic book or Japanese manga. Users can explore the 
video by interacting with the presented summary. Links from each keyframe start video 
playback and/or present additional detail. Caption ... 

Keywords: keyframe extraction, video browsing, video summarization 



20 Enhanced hypertext categorization using hyperlinks
Soumen Chakrabarti, Byron Dom, Piotr Indyk 

June 1998 ACM SIGMOD Record, Proceedings of the 1998 ACM SIGMOD international
conference on Management of data, volume 27 issue 2
Full text available: pdf(1.91 MB) Additional Information: full citation, abstract, references, citings, index terms

A major challenge in indexing unstructured hypertext databases is to automatically extract 
meta-data that enables structured search using topic taxonomies, circumvents keyword 
ambiguity, and improves the quality of search and profile-based routing and filtering. 
Therefore, an accurate classifier is an essential component of a hypertext database. 
Hyperlinks pose new problems not addressed in the extensive text classification literature. 
Links clearly contain high-quality semantic clues that ... 